An introduction to coding in R
Goals
Setup R to work as you intend
Code in a reproducible, clear style
Gain a familiarity with key data wrangling functions
Become well versed in creating figures with ggplot
Create a document that should help you in the future
Setting up
Make your interface look nice:
Projects
File > New project
Existing directory: places project in an existing folder
New directory: creates new folder
Version control: handy if you want to use github
Projects are powerful:
R knows where to look for files
No need to worry about setting working directories
Great for sharing
Quarto
Open a quarto document in your new project
File > New File > Quarto document
Save the document within the project directory (where you already are)
Save the _quarto.yml provided in the email within this directory
Render the document
Packages
The base version of R can be upgraded with packages
We shall use the tidyverse collection of packages.
#install.packages("tidyverse")
#install.packages("pander")
#install.packages("patchwork")
#install.packages("MetBrewer")
library (tidyverse) # for tidy coding
library (tinytable) # for nice tables
library (patchwork) # for aligning plots
library (MetBrewer) # for nice colours to use when making figures
library (ggridges) # for nice density plots
Rhamphorhynchus muensteri
The dataset
1 pterosaur_data <-
2 read_delim ("pterosaur_data.csv" , delim = ";" )
3 pterosaur_data
1
Assign a name to the dataset
2
Load the csv file and specify the separator between columns
3
Display the dataset
# A tibble: 138 × 15
Individual_ID ORBIT SKULL NECK TRUNK_LENGTH TAIL HUMERUS RADIUS
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 11 40 21.5 47.5 106. 16.5 26.7
2 2 10 35 18 NA 112 15.3 26
3 3 NA NA NA 36 85 NA NA
4 4 NA 36 NA 39 115 17 28
5 5 NA NA 15.5 36.5 100 14.6 23.8
6 6 12 35 23 40 110 NA NA
7 7 10 35 20 41 106 15.5 24
8 8 NA 31 NA NA NA 14.5 24
9 9 NA NA 17 36 79 NA NA
10 10 12.5 41 23 47 125 19 32
# ℹ 128 more rows
# ℹ 7 more variables: METACARPAL_4 <dbl>, WING_PHALANX_1 <dbl>,
# WING_PHALANX_2 <dbl>, WING_PHALANX_3 <dbl>, WING_PHALANX_4 <dbl>,
# FEMUR <dbl>, TIBIA <dbl>
Rhamphorhynchus
Some intriguing patterns
The key to tidyverse coding: %>%
This weird symbol is called a pipe
pterosaur_data %>% view ()
In common language this means: load the pterosaur data then view it in a new tab
The holy trinity
select(): order, rename or drop columns
filter(): keep or remove specific rows
mutate(): create new columns or edit existing ones
If you ever need help with a function, ? is your friend
# An example of how to get some help
?select
select()
Removing columns you aren’t interested in:
pterosaur_data %>% select (Individual_ID, TAIL)
1 pterosaur_data %>% select (! c (Individual_ID, TAIL))
2 pterosaur_data %>% select (contains ("WING" ))
3 pterosaur_data %>% select (1 , 5 )
1
the ! reverses the statement
2
contains chooses columns with names that contain a pattern
3
Dangerous coding! Avoid.
Changing column names:
pterosaur_data %>% select (Specimen = Individual_ID)
# if you want to keep all other columns
pterosaur_data %>% select (Specimen = Individual_ID, everything ())
# a recommended alternative
pterosaur_data %>% rename (Specimen = Individual_ID)
select() use cases
Create a new dataset that only contains the ID of the individual and wing measurements for phalanxs 2, 3 and 4.
pterosaur_subset <-
pterosaur_data %>%
select (Individual_ID, 11 : 13 )
Returning to the original data, remove the measurements for wing phalanx 2 and 4
pterosaur_subset_2 <-
pterosaur_data %>%
select (! c (WING_PHALANX_2, WING_PHALANX_4))
Why does this cause an error?
pterosaur_data %>% select (contains (WING))
filter()
Choosing rows of interest
Large_data <- pterosaur_data %>% filter (TAIL > 200 )
1 ten_cm_tails <- pterosaur_data %>% filter (TAIL == 100 )
long_tails_and_small_heads <-
pterosaur_data %>% filter (TAIL > 200 & SKULL < 90 )
long_tails_or_small_heads <-
2 pterosaur_data %>% filter (TAIL > 200 | SKULL < 90 )
1
== is needed to filter
2
| indicates or
Dealing with NA values
# remove NAs in single column
pterosaur_data %>% filter (! is.na (ORBIT))
# remove all rows with NAs
pterosaur_data %>% filter_at (vars (2 : 15 ), all_vars (! is.na (.)))
filter() use cases
Find pterosaurs that have longer necks than humerus’
necks_longer_than_forearms <- pterosaur_data %>% filter (NECK > HUMERUS)
Returning to the original data, remove measurements with NA TRUNK_LENGTH values, for individuals with IDs greater than 50
pterosaur_data %>%
filter (! is.na (TRUNK_LENGTH) & Individual_ID > 50 )
Trim the data to only include SKULL lengths between 60 and 90mm
medium_skulls <- pterosaur_data %>% filter (SKULL <= 90 & SKULL >= 60 )
filter() use cases
Find the individuals with the maximum and minimum tail lengths
pterosaur_data %>%
filter (! is.na (TAIL)) %>%
filter (TAIL == max (TAIL) |
TAIL == min (TAIL))
Find the individuals with tail lengths above the mean of the sampled population
pterosaur_data %>%
filter (! is.na (TAIL)) %>%
filter (TAIL > mean (TAIL))
filter(): handy operators
== = equal to
& = and
| = or
! = does not
> = greater than
< = less than
mutate(): modifying existing columns
Let’s change the units of measurement to centimetres
pterosaur_data_cm <- pterosaur_data %>% mutate (ORBIT = ORBIT/ 10 )
Now add 10cm to each orbit measurement (but don’t save this!)
pterosaur_data %>% mutate (ORBIT = ORBIT + 10 )
mutate(): creating new columns
The total length of a wing is roughly the sum of the lengths of the humerus, radius, fourth metacarpal and the four wing phalanxs. With mutate(), we can calculate this and add it to the dataset:
pterosaur_data <-
pterosaur_data %>%
mutate (single_wing_length =
HUMERUS + RADIUS + METACARPAL_4 + WING_PHALANX_1 +
WING_PHALANX_2 + WING_PHALANX_3 + WING_PHALANX_4) %>%
select (Individual_ID, single_wing_length, everything ())
Conditional mutation
Can we place individuals into phenotypic classes?
Conditional mutation
pterosaur_data_classes <-
pterosaur_data %>%
mutate (Size_class = case_when (
1 single_wing_length < 300 ~ "Small" ,
2 single_wing_length >= 300 ~ "Large" ,
3 .default = "Unknown" ))
pterosaur_data_classes %>%
select (Individual_ID, Size_class, single_wing_length)
1
For this subset of cases…
2
For a second subset of cases…
3
For all remaining cases…
# A tibble: 138 × 3
Individual_ID Size_class single_wing_length
<dbl> <chr> <dbl>
1 1 Small 183.
2 2 Small 174.
3 3 Unknown NA
4 4 Small 189.
5 5 Small 166.
6 6 Unknown NA
7 7 Small 164.
8 8 Unknown NA
9 9 Unknown NA
10 10 Small 221
# ℹ 128 more rows
Conditional mutation
It’s also possible to mutate a single row
pterosaur_data %>%
mutate (Sex = case_when (Individual_ID == 1 ~ "Special" ,
.default = "Ordinary" ))
Build your phenotypic classes
Not every individual has a recorded wing length.
But there are other morphological traits in the dataset
Create a classification criteria and implement it
Phenotypic classes
pterosaur_data_classes <-
pterosaur_data %>%
mutate (Size_class = case_when (single_wing_length < 300 ~ "Small" ,
single_wing_length >= 300 ~ "Large" ,
WING_PHALANX_1 < 80 ~ "Small" ,
WING_PHALANX_1 >= 80 ~ "Large" ,
FEMUR < 20 ~ "Small" ,
FEMUR >= 20 ~ "Large" ,
TAIL < 150 ~ "Small" ,
TAIL >= 200 ~ "Large" ,
SKULL < 80 ~ "Small" ,
SKULL >= 100 ~ "Large" ,
HUMERUS < 25 ~ "Small" ,
HUMERUS >= 30 ~ "Large" ,
SKULL < 90 ~ "Small" ,
SKULL >= 90 ~ "Large" ,
.default = "Unknown" )) %>%
select (Individual_ID, Size_class, everything ())
Bonus content: summarise()
The logic of mutate() can be extended to summarise() row values
Rows can be grouped to summarise conditionally using the group_by function
pterosaur_data_classes %>%
group_by (Size_class) %>%
1 summarise ("Wing length" = mean (single_wing_length, na.rm = T))
1
mean() has a built-in way to deal with NA values
# A tibble: 3 × 2
Size_class `Wing length`
<chr> <dbl>
1 Large 490.
2 Small 197.
3 Unknown NaN
Your task
Split pterosaurs into phenotypic classes and remove those you can’t categorise
Trim the dataframe to only include class, skull, length, wing length and tail length
Summarise the data to show the mean for morphological traits, for each class
Convert to cm and round to zero decimal places
Focus on writing clear code, with comments (using the #) accompanying each important step.
Hint: the round() function can be used inside mutate()
Table making
Once complete, pass your polished dataframe to this function with the %>% to make a neat table
# your dataframe goes here %>%
1 tt ()
The tidyverse ‘for loop’
In the tidyverse, the accumulate() function replaces the for loop. Here’s how it works
Say we want to work out debt accumulated on a loan over a 6 month period
The monthly interest rate is 10%, or if you’re unlucky 20%
tibble (month = 0 : 6 ,
debt_10 = accumulate (1 : 6 , .init = 100 , ~ .x* 1.1 ),
debt_20 = accumulate (1 : 6 , .init = 100 , ~ .x* 1.2 ))
~ .x * 1.1 adds 10% to each element, starting from the initial value 100
accumulate() use cases
Imagine now that instead of money, you need to track the number of pterosaurs in a population. That population starts off with 400 individuals and decreases by 5% each year. Track the population for 25 years.
tibble (year = 0 : 25 ,
pterosaurs = accumulate (1 : 25 , .init = 400 , ~ .x* 0.95 ))
Now consider migration. Each year, 10 individuals enter the population from a neighbouring source population.
tibble (year = 0 : 25 ,
pterosaurs = accumulate (1 : 25 , .init = 400 , ~ (.x* 0.95 ) + 10 ))
Visualising data
At its core, science communication is most effective through visual mediums
The ggplot2 package is included in the tidyverse
ggplot()
Build plots one layer at a time
Layers are added on top of one another
New layers are added with the + symbol
+ == %>% in ggplot-land
Getting started
pterosaur_data_classes %>%
ggplot (aes ())
ggplot() provides an empty canvas
aes determines how variables are mapped to visual aesthetics
Building a geom_histogram()
pterosaur_data_classes %>%
ggplot (aes (x = SKULL/ 10 , fill = Size_class)) +
geom_histogram (binwidth = 0.1 )
Fix the labels
pterosaur_data_classes %>%
ggplot (aes (x = SKULL/ 10 , fill = Size_class)) +
geom_histogram (binwidth = 0.1 ) +
labs (x = "Skull length (cm)" , y = "No. individuals" , fill = "Size class" )
Fix the theming
pterosaur_data_classes %>%
ggplot (aes (x = SKULL/ 10 , fill = Size_class)) +
geom_histogram (binwidth = 0.1 ) +
labs (x = "Skull length (cm)" , y = "No. individuals" , fill = "Size class" ) +
theme_classic () + # new
theme (panel.grid.major = element_line (), # new
text = element_text (size= 14 )) # new
Fix the axis
pterosaur_data_classes %>%
ggplot (aes (x = SKULL/ 10 , fill = Size_class)) +
geom_histogram (binwidth = 0.1 ) +
labs (x = "Skull length (cm)" , y = "No. individuals" , fill = "Size class" ) +
scale_x_continuous (expand = c (0 , 0 ), # new
breaks = c (0 , 4.0 , 8.0 , 12.0 , 16.0 , 20.0 ), # new
limits = c (0 , 20.0 )) + # new
scale_y_continuous (expand = c (0 , 0 )) + # new
theme_classic () +
theme (panel.grid.major = element_line (),
text = element_text (size= 14 ))
Change the colours
Check out met.brewer
pterosaur_data_classes %>%
ggplot (aes (x = SKULL/ 10 , fill = Size_class)) +
geom_histogram (binwidth = 0.1 ) +
labs (x = "Skull length (cm)" , y = "No. individuals" , fill = "Size class" ) +
scale_x_continuous (expand = c (0 , 0 ),
breaks = c (0 , 4.0 , 8.0 , 12.0 , 16.0 , 20.0 ),
limits = c (0 , 20.0 )) +
scale_y_continuous (expand = c (0 , 0 )) +
scale_fill_manual (values = c (met.brewer ("Monet" )[2 ], met.brewer ("Monet" )[8 ])) + # new
theme_classic () +
theme (panel.grid.major = element_line (),
text = element_text (size= 14 ))
Change to geom_density()
pterosaur_data_classes %>%
ggplot (aes (x = SKULL/ 10 , fill = Size_class)) +
geom_density (colour = NA , alpha = 0.7 ) + # new
labs (x = "Skull length (cm)" , y = "No. individuals" , fill = "Size class" ) +
scale_fill_manual (values = c (met.brewer ("Monet" )[2 ], met.brewer ("Monet" )[8 ])) +
scale_x_continuous (expand = c (0 , 0 ),
breaks = c (0 , 4.0 , 8.0 , 12.0 , 16.0 , 20.0 ),
limits = c (0 , 20.0 )) +
scale_y_continuous (expand = c (0 , 0 )) +
theme_classic () +
theme (panel.grid.major = element_line (),
text = element_text (size= 14 ))
The scatterplot: geom_point()
pterosaur_data_classes %>%
ggplot (aes (x = ORBIT, y = WING_PHALANX_1)) +
geom_point ()
Make improvements
pterosaur_data_classes %>%
ggplot (aes (x = ORBIT/ 10 , y = WING_PHALANX_1/ 10 )) +
geom_point () +
labs (x = "Orbit length (cm)" , y = "First wing phalanx \n length (cm)" ) +
scale_fill_manual (values = c (met.brewer ("Monet" )[2 ], met.brewer ("Monet" )[8 ])) +
scale_x_continuous (expand = c (0 , 0 ), limits = c (0 , 5 )) +
scale_y_continuous (expand = c (0 , 0 ), limits = c (0 , 22 )) +
theme_classic () +
theme (panel.grid.major = element_line (),
text = element_text (size= 14 ))
geom_point()
pterosaur_data_classes %>% filter (Size_class != "Unknown" ) %>%
ggplot (aes (x = ORBIT/ 10 , y = WING_PHALANX_1/ 10 )) +
geom_point (aes (fill = Size_class), shape = 21 , size = 5 , alpha = 0.8 ,
colour = "black" ) +
labs (x = "Orbit length (cm)" , y = "First wing phalanx \n length (cm)" ,
fill = "Size class" ) +
scale_fill_manual (values = c (met.brewer ("Hiroshige" )[4 ],
met.brewer ("Hiroshige" )[6 ])) +
scale_x_continuous (expand = c (0 , 0 ), limits = c (0 , 5 )) +
scale_y_continuous (expand = c (0 , 0 ), limits = c (0 , 22 )) +
theme_classic () +
theme (panel.grid.major = element_line (),
text = element_text (size= 14 ))
Continuous vs discrete colours
pterosaur_data_classes %>% filter (Size_class != "Unknown" ) %>%
ggplot (aes (x = ORBIT/ 10 , y = WING_PHALANX_1/ 10 )) +
geom_point (aes (fill = ORBIT/ 10 ), shape = 21 , size = 5 , alpha = 0.9 ,
colour = "black" ) +
labs (x = "Orbit length (cm)" , y = "First wing phalanx \n length (cm)" ,
fill = "Orbit \n length" ) +
scale_fill_gradientn (colors= met.brewer ("Hiroshige" , direction = - 1 )) +
scale_x_continuous (expand = c (0 , 0 ), limits = c (0 , 5 )) +
scale_y_continuous (expand = c (0 , 0 ), limits = c (0 , 22 )) +
theme_classic () +
theme (panel.grid.major = element_line (),
text = element_text (size= 14 ))
Line plots
Let’s return to our crippling interest rates
line_plot <-
tibble (month = 0 : 20 ,
` 10% ` = accumulate (1 : 20 , .init = 5 , ~ .x* 1.1 ),
` 20% ` = accumulate (1 : 20 , .init = 5 , ~ .x* 1.2 )) %>%
pivot_longer (2 : 3 , names_to = "Interest_rate" , values_to = "Amount_owed" ) %>%
# plot
ggplot (aes (x = month, y = Amount_owed,
group = Interest_rate, colour = Interest_rate)) +
geom_line (linewidth = 2 )
line_plot
Make it beautiful
line_plot +
labs (x = "Months" , y = "Dollars owed" ,
colour = "Interest rate" ) +
scale_colour_manual (values = c (met.brewer ("VanGogh3" )[2 ],
met.brewer ("VanGogh3" )[6 ])) +
scale_x_continuous (expand = c (0 , 0 ), limits = c (0 , 20 )) +
scale_y_continuous (expand = c (0 , 0 ), limits = c (0 , 210 )) +
theme_classic () +
theme (panel.grid.major = element_line (),
text = element_text (size= 28 ))
General tips
alpha: changes the transparency
fill: colours the inside of elements
colour: colours the outlines of elements
If you want these to change with your data, place them inside aes()
Many more plot styles can be found here